Fast Convergence of Optimistic Gradient Ascent in Network Zero-Sum Extensive Form Games

Piliouras, Georgios, Ratliff, Lillian, Sim, Ryann, Skoulakis, Stratis

arXiv.org Artificial Intelligence

The study of learning in games has thus far focused primarily on normal form games. In contrast, our understanding of learning in extensive form games (EFGs) and particularly in EFGs with many agents lags far behind, despite them being closer in nature to many real world applications. We consider the natural class of Network Zero-Sum Extensive Form Games, which combines the global zero-sum property of agent payoffs, the efficient representation of graphical games as well as the expressive power of EFGs. We examine the convergence properties of Optimistic Gradient Ascent (OGA) in these games. We prove that the time-average behavior of such online learning dynamics exhibits $O(1/T)$ rate convergence to the set of Nash Equilibria. Moreover, we show that the day-to-day behavior also converges to Nash with rate $O(c^{-t})$ for some game-dependent constant $c>1$.
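To illustrate the optimistic gradient dynamics the abstract refers to, here is a minimal sketch of optimistic gradient descent/ascent on an unconstrained bilinear zero-sum game $f(x, y) = x^\top A y$. The payoff matrix, step size, and starting points are illustrative choices of ours, not taken from the paper; the paper's setting (networks of EFG players on the simplex) is considerably more general.

```python
import numpy as np

# Bilinear zero-sum game f(x, y) = x^T A y with a fixed, well-conditioned
# payoff matrix (our choice, for illustration only).
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
x = np.array([1.0, -1.0])   # minimizing player
y = np.array([0.5, 1.0])    # maximizing player
eta = 0.05

gx_prev, gy_prev = A @ y, A.T @ x   # gradients at the previous step

for _ in range(5000):
    gx, gy = A @ y, A.T @ x         # current gradients
    # Optimistic update: extrapolate using 2*current - previous gradient.
    x = x - eta * (2 * gx - gx_prev)   # descent step for the minimizer
    y = y + eta * (2 * gy - gy_prev)   # ascent step for the maximizer
    gx_prev, gy_prev = gx, gy

# For a full-rank bilinear game the unique equilibrium is (0, 0); plain
# simultaneous gradient play cycles, while the optimistic variant
# contracts toward it at a linear (geometric) rate.
print(np.linalg.norm(x), np.linalg.norm(y))
```

The only change from vanilla gradient play is the `2*g - g_prev` extrapolation, which is what buys the geometric ("day-to-day") convergence the abstract describes.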


Game Theory In Artificial Intelligence

#artificialintelligence

I want to start off with a quick question: can you recognize the two people in the image below? I'm certain you got one right. For many of us who were math enthusiasts from an early age, the movie "A Beautiful Mind" is inextricably embedded in our memory. Russell Crowe plays John Nash in the movie, a Nobel prize winner in economics (and the person on the left-hand side above). You might remember the iconic scene, often referred to as "Don't go after the blonde": "…the best outcome would come when everyone in the group is doing what's best for himself and the group."


Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

McAleer, Stephen, Lanier, John, Fox, Roy, Baldi, Pierre

arXiv.org Artificial Intelligence

Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$. P2SRO is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots.
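The PSRO loop the abstract builds on can be sketched on a small matrix game, where exact best responses stand in for the reinforcement-learning "workers" of P2SRO. Everything below (function names, fictitious-play meta-solver, the rock-paper-scissors example) is our illustrative choice, not the paper's implementation.

```python
import numpy as np

def solve_zero_sum(M, iters=5000):
    """Approximate Nash of the restricted meta-game via fictitious play."""
    n, m = M.shape
    row_counts, col_counts = np.zeros(n), np.zeros(m)
    i, j = 0, 0
    for _ in range(iters):
        row_counts[i] += 1
        col_counts[j] += 1
        i = int(np.argmax(M @ (col_counts / col_counts.sum())))
        j = int(np.argmin((row_counts / row_counts.sum()) @ M))
    return row_counts / iters, col_counts / iters

def psro(payoff, iterations=10):
    rows, cols = [0], [0]                  # initial single-policy populations
    for _ in range(iterations):
        M = payoff[np.ix_(rows, cols)]     # restricted "meta-game"
        p, q = solve_zero_sum(M)
        p_full = np.zeros(payoff.shape[0]); p_full[rows] = p
        q_full = np.zeros(payoff.shape[1]); q_full[cols] = q
        # Best-response oracle against the opponent's meta-strategy
        # (an RL policy-training step in the full-scale algorithm).
        br_row = int(np.argmax(payoff @ q_full))
        br_col = int(np.argmin(p_full @ payoff))
        if br_row not in rows: rows.append(br_row)
        if br_col not in cols: cols.append(br_col)
    return rows, cols

# Rock-paper-scissors: the populations grow until they cover the full
# support of the unique mixed equilibrium.
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
rows, cols = psro(rps)
print(sorted(rows), sorted(cols))
```

P2SRO's contribution is to run many such best-response computations in parallel in a hierarchical pipeline while preserving the convergence guarantee of this sequential loop.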


Large Scale Learning of Agent Rationality in Two-Player Zero-Sum Games

Ling, Chun Kai, Fang, Fei, Kolter, J. Zico

arXiv.org Artificial Intelligence

With the recent advances in solving large, zero-sum extensive form games, there is a growing interest in the inverse problem of inferring underlying game parameters given only access to agent actions. Although recent work provides a powerful differentiable end-to-end learning framework which embeds a game solver within a deep-learning pipeline, allowing unknown game parameters to be learned via backpropagation, this framework faces significant limitations when applied to boundedly rational human agents and large-scale problems, limiting its practicality. In this paper, we address these limitations and propose a framework that is applicable to more practical settings. First, seeking to learn the rationality of human agents in complex two-player zero-sum games, we draw upon well-known ideas in decision theory to obtain a concise and interpretable agent behavior model, and derive solvers and gradients for end-to-end learning. Second, to scale up to large, real-world scenarios, we propose an efficient first-order primal-dual method which exploits the structure of extensive-form games, yielding significantly faster computation for both game solving and gradient computation. When tested on randomly generated games, we report speedups of orders of magnitude over previous approaches. We also demonstrate the effectiveness of our model on both real-world one-player settings and synthetic data.


What game are we playing? End-to-end learning in normal and extensive form games

Ling, Chun Kai, Fang, Fei, Kolter, J. Zico

arXiv.org Machine Learning

Although recent work in AI has made great progress in solving large, zero-sum, extensive-form games, the underlying assumption in most past work is that the parameters of the game itself are known to the agents. This paper deals with the relatively under-explored but equally important "inverse" setting, where the parameters of the underlying game are not known to all agents, but must be learned through observations. We propose a differentiable, end-to-end learning framework for addressing this task. In particular, we consider a regularized version of the game, equivalent to a particular form of quantal response equilibrium, and develop 1) a primal-dual Newton method for finding such equilibrium points in both normal and extensive form games; and 2) a backpropagation method that lets us analytically compute gradients of all relevant game parameters through the solution itself. This ultimately lets us learn the game by training in an end-to-end fashion, effectively by integrating a "differentiable game solver" into the loop of larger deep network architectures. We demonstrate the effectiveness of the learning method in several settings including poker and security game tasks.
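The quantal response equilibrium of the regularized game mentioned above can be sketched for a matrix game with a plain fixed-point iteration on the logit best-response map. This stands in for the paper's primal-dual Newton method; the rationality parameter `lam` and the rock-paper-scissors example are our illustrative choices.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def logit_qre(A, lam=1.0, iters=500):
    """Fixed-point iteration x = softmax(lam*A y), y = softmax(-lam*A^T x)."""
    n, m = A.shape
    x, y = np.full(n, 1 / n), np.full(m, 1 / m)
    for _ in range(iters):
        x = softmax(lam * (A @ y))      # row player's logit response
        y = softmax(-lam * (A.T @ x))   # column player (the minimizer)
    return x, y

# Rock-paper-scissors: by symmetry the logit QRE is the uniform mixture.
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
x, y = logit_qre(rps)
print(x, y)
```

The entropy regularization smooths the best-response correspondence into the softmax map above, which is what makes the equilibrium differentiable with respect to the game parameters and hence usable inside a backpropagation loop.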


Abstraction Using Analysis of Subgames

Basak, Anjon (The University of Texas at El Paso)

AAAI Conferences

Normal form games are one of the most familiar representations for modeling interactions among multiple agents. However, modeling many realistic interactions between agents results in games that are extremely large. In these cases, computing standard solutions like Nash equilibrium may be intractable. To overcome this issue, the idea of abstraction has been investigated, most prominently in research on computer Poker. Solving a game using abstraction requires some method to simplify the game before it is analyzed. We study a new variation for solving normal form games using abstraction that is based on finding and solving suitable subgames. We compare this method with several variations of a common type of abstraction based on clustering similar strategies.


Bounding the Support Size in Extensive Form Games with Imperfect Information

Schmid, Martin (Charles University in Prague) | Moravcik, Matej (Charles University in Prague) | Hladik, Milan (Charles University in Prague)

AAAI Conferences

It is a well-known fact that in extensive form games with perfect information, there is a Nash equilibrium with support of size one. This does not hold for games with imperfect information, where the minimal support can be larger. We present a dependency between the level of uncertainty and the minimum support size. For many games, there is a large gap between the game's uncertainty and the number of actions available. In Bayesian extensive games with perfect information, the only uncertainty is about the type of players. In card games, the uncertainty comes from dealing the deck. In these games, we can significantly reduce the support size. Our result applies to general-sum extensive form games with any finite number of players.
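The support-one claim for perfect information is easy to see via backward induction: each player commits to a single pure action at every node. Here is a tiny sketch on a made-up two-level zero-sum tree (the tree, leaf payoffs, and function names are ours, chosen only to illustrate the claim).

```python
def solve(node, maximizing=True):
    """Backward induction: return (value, pure strategy path) for a
    perfect-information zero-sum tree given as nested dicts with
    numeric leaves (payoffs to the maximizer)."""
    if not isinstance(node, dict):        # leaf
        return node, []
    results = {a: solve(child, not maximizing) for a, child in node.items()}
    pick = max if maximizing else min
    action = pick(results, key=lambda a: results[a][0])
    value, plan = results[action]
    return value, [action] + plan

# Maximizer moves first (L or R), minimizer replies (l or r).
tree = {"L": {"l": 3, "r": -1},
        "R": {"l": 0, "r": 2}}
value, plan = solve(tree)
print(value, plan)
```

Each call returns one pure action per node, i.e. an equilibrium with support size one; it is precisely the imperfect-information case, where a player cannot distinguish nodes inside an information set, that forces mixing and larger supports.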


Learning to Cooperate in Normal Form Games

Damer, Steven (University of Minnesota) | Gini, Maria (University of Minnesota)

AAAI Conferences

We study the problem of achieving cooperation between two self-interested agents that play a sequence of randomly generated normal form games, each game played only once. To achieve cooperation we extend a model used to explain cooperative behavior by humans. We show how a modification of a pre-regularized particle filter can be used to detect the cooperation level of the opponent and play accordingly. We examine how properties of the games affect the ability of an agent to detect cooperation and explore the effects of different environments and different levels of conflict. We present results obtained in simulation on hundreds of randomly generated games.
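A regularized particle filter of the kind the abstract mentions can be sketched in a few lines: particles track a latent scalar (here, a "cooperation level"), are reweighted by an observation likelihood, resampled, and then perturbed with a small kernel jitter to keep the particle set diverse. The generative model, noise levels, and constants below are entirely our own illustrative assumptions, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: noisy signals of a fixed latent cooperation level.
true_level = 0.7
obs = true_level + 0.2 * rng.standard_normal(200)

n = 1000
particles = rng.uniform(-1, 1, n)        # broad prior over the level
for z in obs:
    # Weight each particle by a Gaussian observation likelihood.
    w = np.exp(-0.5 * ((z - particles) / 0.2) ** 2)
    w /= w.sum()
    # Resample proportionally to weight, then regularize with a small
    # kernel perturbation (the "regularized" part of the filter).
    particles = rng.choice(particles, size=n, p=w)
    particles += 0.01 * rng.standard_normal(n)

estimate = particles.mean()
print(round(estimate, 2))
```

The jitter step prevents the degeneracy where resampling collapses all particles onto a few values, which matters here because the tracked quantity (an opponent's attitude) is nearly static.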